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Abstract 



The relationship between short-term exposure to air pollution and mortality 
or morbidity has been the subject of much recent research, in which the stan- 
dard method of analysis uses Poisson linear or additive models. In this paper 
we use a Bayesian dynamic generalised linear model (DGLM) to estimate this 
relationship, which allows the standard linear or additive model to be extended 
in two ways: (i) the long-term trend and temporal correlation present in the 
health data can be modelled by an autorcgrcssive process rather than a smooth 
function of calendar time; (ii) the effects of air pollution are allowed to evolve 
over time. The efficacy of these two extensions are investigated by applying 
a series of dynamic and non-dynamic models to air pollution and mortality 
data from Greater London. A Bayesian approach is taken throughout, and a 
Markov chain montc carlo simulation algorithm is presented for inference. An 
alternative likelihood based analysis is also presented, in order to allow a direct 
comparison with the only previous analysis of air pollution and health data 
using a DGLM. 

Key words dynamic generalised linear model, Bayesian analysis, Markov 
chain monte carlo simulation, air pollution 
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1 Introduction 



The detrimental health effects associated with short-term exposure to air pollution 
is a major issue in public health, and the subject has received a great deal of at- 
tention in recent years. A number of epidemiological studies have found positive 
associations between common pollutants, such as particulate matter (measured as 
PMio), ozone or carbon monoxide, and mortality or morbidity, with many of these 
associations relating to pollution levels below existing guidelines and standards (see, 
for example, Dominici et al. (2002), Vedal et al. (2003) or Roberts (2004)). These 
associations have been estimated from single-site and multi-city studies, the latter of 
which include 'Air pollution and health: a European approach' (APHEA) (Zmirou 
et al. (1998)) and 'The National Morbidity, Mortality, and Air Pollution Study' 
(NMMAPS) (Samet et al. (2000)). Although these studies have been conducted 
throughout the world in a variety of climates, positive associations have been con- 
sistently observed. The majority of these associations have been estimated using 
time series regression methods, and as the health data are only available as daily 
counts, Poisson generalised linear and additive models are the standard method of 
analysis. These data relate to the number of mortality or morbidity events that 
arise from the population living within a fixed region, for example a city, and are 
collected at daily intervals. Denoting the number of health events on day t hy yt, 
the standard log-linear model is given by 



in which the natural log of the expected health counts is linearly related to air pol- 
lution levels wt and a vector of r covariates, zj = {zti, ■ ■ ■ , ztr)- The covariates 
model any seasonal variation, long-term trends, and temporal correlation present in 
the health data, and typically include smooth functions of calendar time and me- 
teorological variables, such as temperature. If the smooth functions are estimated 
parametrically using regression splines, the model is linear, where as non-parametric 
estimation using smoothing splines, leads to an additive model. 

In this paper we investigate the efficacy of using Bayesian dynamic generalised linear 
models (DGLMs, West et al. (1985) and Fahrmeir and Tutz (2001)) to analyse air 
pollution and health data. Dynamic generalised linear models extend generalised 
linear models by allowing the regression parameters to evolve over time via an au- 
toregressive process of order p, denoted AR(p). The autoregressive nature of such 
models suggest two changes from the standard model (1) described above. Firstly, 
long-term trends and temporal correlation, present in the health data, can be mod- 
elled with an autoregressive process, which is in contrast to the standard approach 
of using a smooth function of calendar time. Secondly, the effects of air pollution 
can be modelled with an autoregressive process, which allows these effects to evolve 
over time. This evolution may be due to a change in the composition of individual 
pollutants, or because of a seasonal interaction with temperature. This is a compar- 
atively new area of research, for which Peng et al. (2005) and Chiogna and Gaetan 
(2002) are the only known studies in this setting. The first of these forces the ef- 
fects to follow a fixed seasonal pattern, which does not allow any other temporal 




Poisson (^t) for i = 1, . . . 



n. 
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variation, such as a long-term trend. In contrast, Chiogna and Gaetan (2002) model 
this evolution as a first order random walk, which does not fix the temporal shape 
a-priori, allowing it to be estimated from the data. Their work is the only known 
analysis of air pollution and health data using DGLMs, and they implement their 
model in a likelihood framework using the Kalman filter. In this paper we present 
a Bayesian analysis based on Markov chain monte carlo (MCMC) simulation, which 
we believe is a more natural framework in which to analyse hierarchical models of 
this type. 

The remainder of this paper is organised as follows. Section 2 introduces the 
Bayesian DGLM proposed here, and compares it to the likelihood based approach 
used by Chiogna and Gaetan (2002). Section 3 describes a Markov chain monte 
carlo estimation algorithm for the proposed model, while section 4 discusses the ad- 
vantages of dynamic models for these data in more detail. Section 5 presents a case 
study, which investigates the utility of dynamic models in this context by analysing 
data from Greater London. Finally, section 6 gives a concluding discussion and 
suggests areas for future work. 

2 Bayesian Dynamic generalised linear models 

A Bayesian dynamic generalised linear model extends a generalised linear model by 
allowing a subset of the regression parameters to evolve over time as an autoregres- 
sive process. The general model proposed here begins with a Poisson assumption 
for the health data and is given by 



yt ~ Poisson(/_ff) for t = 1, . . . ,n, 
ln{nt) = ^l:Pt + zja, 

Pt = Fi/3,_i + . . . + + ivt~N(0,S^), 

/3o,---,/3-p+i ~ N(/Xo,So), (2) 
a ~ N(/^q.,Sq,), 

Inverse- Wishart (ns, "S*^""^)- 

The vector of health counts are denoted by y = (yi, . . . ,yn)Jxii ^^'^ covariates 
include an r x 1 vector zj, with fixed parameters a = (ai, . . . , ar)Jxi' ^^"^ a q x 1 
vector xt, with dynamic parameters (3^ = {f^ti, • • • > Aij)gxi- '^^^ dynamic parameters 
are assigned an autoregressive prior of order p, which is initialised by starting pa- 
rameters {P_p_^_i, . . . , (3q) at times (— p+ 1, . . . , 0). Each initialising parameter has a 
Gaussian prior with mean Ato^xi variance ^o^^^, and are included to allow (3i to 
follow an autoregressive process. The autoregressive parameters can be stacked into 
a single vector denoted by /3 = (/3_p+i, . . . , (3^, (3i, . . . , Pn)Jn+p)qxi^ ^^'^ ^he vari- 
ability in the process is controlled hy a q x q variance matrix S^, which is assigned 
a conjugate inverse- Wishart prior. For univariate processes is scalar, and the 
conjugate prior simplifies to an inverse-gamma distribution. The evolution and sta- 
tionarity of this process are determined by and the qx q autoregressive matrices 
F = {Fi, . . . ,Fp}, the latter of which may contain unknown parameters or known 
constants, and the prior specification depends on its form. For example, a univariate 
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first order autoregressive process is stationary if \Fi\ < 1, and a prior specification is 
discussed in section 3.1.4. A Gaussian prior is assigned to a because prior informa- 
tion is simple to specify in tliis form. The unknown parameters are (/3,a,S^) and 
components of, F, wliereas tlie liyperparameters (/x^^, Sq, n^, Ss, fJ-o, Sq) are known. 

2.1 Estimation for a DGLM 

We propose a Bayesian implementation of (2) using MCMC simulation, because it 
provides a natural framework for inference in hierarchical models. However, numer- 
ous alternative approaches have also been suggested, and a brief review is given here. 
West et al. (1985) proposed an approximate Bayesian analysis based on relaxing the 
normality of the AR(1) process, and assuming conjugacy between the data model 
and the AR(1) parameter model. They use Linear Bayes methods to estimate the 
conditional moments of (3^, while estimation of is circumvented using the discount 
method (Ameen and Harrison (1985)). Fahrmeir and co-workers (see Fahrmeir and 
Kaufmann (1991), Fahrmeir (1992) and Fahrmeir and Wagenpfeil (1997)) propose a 
likelihood based approach, which maximises the joint likelihood /(/3|y). They use 
an iterative algorithm that simultaneously updates f3 and S^, using the iteratively 
re-weighted Kalman filter and smoother, and expectation-maximisation (EM) algo- 
rithm (or generalised cross validation). This is the estimation approach taken by 
Chiogna and Gaetan (2002), and a comparison with our Bayesian implementation 
is given below. Other approaches to estimation include approximating the posterior 
density by piecewise linear functions (Kitagawa (1987)), using numerical integration 
methods (Fruhwirth-Schnatter (1994)), and particle filters Kitagawa (1996). 

2.2 Comparison with the hkehhood based approach 

The main difference between this work and that of Chiogna and Gaetan (2002), who 
also used a DGLM in this setting, is the approach taken to estimation and inference. 
We propose a Bayesian approach with analysis based on MCMC simulation, which 
we believe has a number of advantages over the likelihood based analysis used by 
Chiogna and Gaetan. In a Bayesian approach the posterior distribution of (3 cor- 
rectly allows for the variability in the hyperparameters, while confidence intervals 
calculated in a likelihood analysis do not. In a likelihood analysis (S^,F) are es- 
timated by data driven criteria, such as generalised cross validation, and estimates 
and standard errors of f3 are calculated assuming (S^, F) are fixed at their estimated 
values. As a result, confidence intervals for f3 are likely to be too narrow, which may 
lead to a statistically insignificant effect of air pollution appearing to be significant. 
In contrast, the Bayesian credible intervals are the correct width, because (/3, S^, F) 
are simultaneously estimated within the MCMC algorithm. 

The Bayesian approach allows the investigator to incorporate prior knowledge of 
the parameters into the model, whilst results similar to a likelihood analysis can be 
obtained by specifying prior ignorance. This is particularly important in dynamic 
models because the regression parameters are likely to evolve smoothly over time, 
and a non- informative prior for may result in the estimated parameter process 
being contaminated with unwanted noise. Such noise may hide a trend in the pa- 
rameter process, and can be removed by specifying an informative prior for S^. The 
Bayesian approach is the natural framework in which to view hierarchical models 
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of this type, because it can incorporate variation at multiple levels in a straight- 
forward manner, whilst making use of standard estimation techniques. In addition, 
the full posterior distribution can be calculated, whereas in a likelihood analysis 
only the mode and variance are estimated. However, as with any Bayesian analysis 
computation of the posterior distribution is time consuming, and likelihood based 
estimation is quicker to implement. To assess the relative performance of the two 
approaches, we apply all models in section 5 using the Bayesian algorithm described 
in section 3 and the likelihood based alternative used by Chiogna and Gaetan (2002). 

The model proposed above is a re-formulation of that used by Chiogna and Gaetan 
(shown in equation (3) below), which fits naturally within the Bayesian framework 
adopted here. Apart from the inclusion of prior distributions in the Bayesian ap- 
proach, there are two major differences between the two models, the first of which 
is operational and the second is notational. Firstly, a vector of covariates with fixed 
parameters (ex) is explicitly included in the linear predictor, which allows the fixed 
and dynamic parameters to be updated separately in the MCMC simulation algo- 
rithm. This enables the autoregressive characteristics of (3 to be incorporated into its 
Metropolis-Hastings step, without forcing the same autoregressive property onto the 
simulation of the fixed parameters. This would not be possible in (3) as covariates 
with fixed parameters are included in the AR(1) process by a particular specification 
of and F (diagonal elements of are zero and F are one). This specification 
is also inefficient because n copies of each fixed parameter are estimated. Secondly, 
at first sight (3) appears to be an AR(1) process which compares with our more 
general AR(p) process. In fact an AR(p) process can be written in the form of (3) 
by a particular specification of (/3, F), but we believe the approach given here is 
notationally clearer. In the next section we present an MCMC simulation algorithm 
for carrying out inference within this Bayesian dynamic generalised linear model. 



yt ^ Poisson(/Zf) for t = 1, . . . ,n 

Pt = Fpt^^+ut i.t~N(0,S^) (3) 
/3o ~ N(/io,So) 

3 MCMC estimation algorithm 

The joint posterior distribution of (/3, a, S^, F) in (2) is given by 



/(/3,a,S^,F|y) oc /(y|/3, «, S^, F)/(«)/(/3|S^, F)/(S^)/(F) 

n n 

= J] Poisson(yt|/3i, a)N(a|/i,, S,) J] N(/3jFi/3i_i + . . . + Fpp^_^, S^) 
t=i t=i 

X N(/3_p+i|/^o> 5]o) • • • N(/3olAto> Sq) Inverse- Wishart(S^|nQ, 5q1)/(F), 



where f{F) depends on the form of the AR(p) process. The next section describes 
the overall simulation algorithm, with specific details given in 3.1.1 - 3.1.4. 
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3.1 Overall simulation algorithm 

The parameters are updated using a block Metropolis-Hastings algorithm, in which 
starting values {f3^^\ a^^\T,^p \ F^^"^) are generated from overdispersed versions of 
the priors (for example t-distributions replacing Gaussian distributions). The pa- 
rameters are alternately sampled from their full conditional distributions in the 
following blocks. 

(a) Dynamic parameters /3 = (/3„p+i, . . . 
Further details are given in Section (3.1.1). 

(b) Fixed parameters a = (ai, . . . , a,-). 
Further details are given in Section (3.1.2). 

(c) Variance matrix S^. 

Further details are given in section (3.1.3). 

(d) AR(p) matrices, F = {Fi, . . . ,Fp) (or components of). 
Further details are given in section (3.1.4). 

3.1.1 Sampling from /(/3|y, a, S^, F) 

The full conditional of /3 is the product of n Poisson observations and a Gaussian 
AR(j)) prior given by 

n n 

/(/3|y,«,S^,F) oc llFoissoniyt\Pt,a)l[N{(3t\FiPt_, + ... + Fp(3,^p,J:^) 
t=i t=i 
X N(/3_p+i|/Xo,So)---N(/3o|/io,So). 

The full conditional is non-standard, and a number of simulation algorithms have 
been proposed that take into account the autoregressive nature of f3. Fahrmeir 
et al. (1992) combine a rejection sampling algorithm with a Gibbs step, but report 
acceptance rates that are very low making the algorithm prohibitively slow. In con- 
trast, Shephard and Pitt (1997) and Gamerman (1998) suggest Metropolis-Hastings 
algorithms, in which the proposal distributions are based on Fisher scoring steps 
and Taylor expansions respectively. However, such proposal distributions are com- 
putationally expensive to calculate, and the conditional prior proposal algorithm 
of Knorr-Held (1999) is used instead. His proposal distribution is computationally 
cheap to calculate, compared with those of Shephard and Pitt (1997) and Gamer- 
man (1998), while the Metropolis-Hastings acceptance rate has a simple form and 
is easy to calculate. Further details are given in appendix A. 

3.1.2 Sampling from /(ctjy, /3, 11/3, F) 

The full conditional of ct is non-standard because it is the product of a Gaussian 
prior and n Poisson observations. As a result, simulation is carried out using a 
Metropolis-Hastings step, and two common choices are random walk and Fisher 
scoring proposals (for details see Fahrmeir and Tutz (2001)). A random walk pro- 
posal is used here because of its computational cheapness compared with the Fisher 
scoring alternative, and the availability of a tuning parameter. The parameters are 
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updated in blocks, which is a compromise between the high acceptance rates ob- 
tained by univariate sampling, and the improved mixing that arises when large sets 
of parameters are sampled simultaneously. Proposals are drawn from a Gaussian 
distribution with mean equal to the current value of the block and a diagonal vari- 
ance matrix. The diagonal variances are typically identical and can be tuned to give 
good acceptance rates. 

3.1.3 Sampling from /(S^|y, /3, a, F) 

The full conditional of comprises n AR(p) Gaussian distributions for and a 
conjugate inverse- Wishart(ns, 5^^) prior, which results in an inverse-Wishart(a, 6) 
posterior distribution with 

a = nY. + n, 

b = isj, + j2iPt-FiPt^i-----FpPt-p){Pt-FiPt^i 
\ t=i 

However, the models applied in section five are based on univariate autoregressive 
processes, for which the conjugate prior simplifies to an inverse-gamma distribution. 
If a non-informative prior is required, an inverse- gamma(e, e) prior with small e 
is typically used. However as discussed in section 2.2, an informative prior may 
be required for S^, and representing informative prior beliefs using a member of 
the inverse-gamma family is not straightforward. The variance parameters of the 
autoregressive processes are likely to be close to zero (to ensure the evolution is 
smooth), so we represent our prior beliefs as a Gaussian distribution with zero mean, 
which is truncated to be positive. The informativeness of this prior is controlled by 
its variance, with smaller values resulting in a more informative distribution. If this 
prior is used, the full conditional can be sampled from using a Metropolis-Hastings 
step with a random walk proposal. 




3.1.4 Sampling from /(F|y, /3, ct, S^) 

The full conditional of F depends on the form and dimension of the AR(p) process, 
and the most common types are univariate AR(1) {(3t ~ N(Fi/3f_i, S/j)) and AR(2) 
(/3i ~ N(Fi/3t_i + F2/3t_2, S/j)) processes. In either case, assigning (Fi) or (Fi,F2) 
flat priors results in a Gaussian full conditional distribution. For example in a uni- 
variate AR(1) process, the full conditional for Fi is Gaussian with mean ^^'""^ ? 

and variance — . Similar results can be found for an AR(2) process. 

4 Modelling air pollution and health data 

As described in the introduction, air pollution and health data are typically modelled 
by Poisson linear or additive models, which are similar to equation (1). The daily 
health counts are regressed against air pollution levels and a vector of covariates, the 
latter of which model long-term trends, seasonal variation and temporal correlation 
commonly present in the daily mortality series. The covariates typically include 
an intercept term, indicator variables for day of the week, and smooth functions of 
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calendar time and meteorological covariates, such as temperature. A large part of 
the seasonal variation is modelled by the smooth function of temperature, while the 
long-term trends and temporal correlation are removed by the smooth function of 
calendar time. The air pollution component typically has the form wtj, which forces 
its effect on health to be constant. Analysing these data with dynamic models allows 
this standard approach to be extended in two ways, both of which are described 
below. 

4.1 Modelling long-term trends and temporal correlation 

The autoregressive nature of a dynamic generalised linear model, enables long-term 
trends and temporal correlation to be modelled by an autoregressive process, rather 
than a smooth function of calendar time. This is desirable because such a process 
sits in discrete time and estimates the underlying trend in the data while 
its smoothness is controlled by a single parameter (the evolution variance). In 
these respects an autoregressive process is a natural choice to model the influence of 
confounding factors because it can be seen as the discrete time analogue of a smooth 
function of calendar time. In the dynamic modelling literature (see for example 
Chatfield (1996) and Fahrmeir and Tutz (2001)), long-term trends are commonly 
modelled by: 

First order random walk /3j N(/3t_i,r^), 
Second order random walk f3t ~ ^C^Pt-i — Pt~2,T'^), (4) 
Local linear trend model (3t ~ N(/?t_i + (5(_i,r^), 

All three processes are non-stationary which allows the underlying mean level to 
change over time, a desirable characteristic when modelling long-term trends. A 
second order random walk is the natural choice from the three alternatives, because 
it is the discrete time analogue of a natural cubic spline of calendar time (Fahrmeir 
and Tutz (2001)), one of the standard methods for estimating the smooth functions. 
Chiogna and Gaetan (2002) also use a second order random walk for this reason, 
but in section five we extend their work by comparing the relative performance 
of smooth functions and each of the three processes listed above. We estimate 
the smooth function with a natural cubic spline, because it is parametric, making 
estimation within a Bayesian setting straightforward. 

4.2 Modelling the effects of air pollution 

The effects of air pollution are typically assumed to be constant (represented by 7), 
or depend on the level of air pollution, the latter of which replaces wtj in (1) with a 
smooth function f{wt\X). This is called a dose-response relationship, and higher pol- 
lution levels typically result in larger adverse effects. Comparatively little research 
has allowed these effects to evolve over time, and any temporal variation is likely to 
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be seasonal or exhibit a long-term trend. Seasonal effects may be caused by an inter- 
action with temperature, or with another pohutant exhibiting a seasonal pattern. In 
contrast, long-term trends may result from a slow change in the composition of harm- 
ful pollutants, or from a change in the size and structure of the population at risk 
over a number of years. The only previous studies which investigate the time- varying 
effects of air pollution are those of Peng et al. (2005) and Chiogna and Gaetan (2002), 
who model the temporal evolution as 7^ = + ^1 sin(27rt/365) -|-02 cos(27ri/365) and 
7t ^ N(7t_i, cr^) respectively. The seasonal form is restrictive because it does not 
allow the temporal variation to exhibit shapes which are not seasonal. In contrast, 
the first order random walk used by Chiogna and Gaetan (2002) does not fix the 
form of the time-varying effects a-priori, allowing their shape to be estimated from 
the data, which results in a more realistic model. In section five we also model 
this temporal variation with a first order random walk, because of its fiexibility and 
because it allows a comparison with the work of Chiogna and Gaetan (2002). 

5 Case study analysing data from Greater London 

The extensions to the standard model described in section 4 are investigated by 
analysing data from Greater London. The first subsection describes the data that 
are used in this case study, the second discusses the choice of statistical models, 
while the third presents the results. 

5.1 Description of the data 

The data used in this case study relate to daily observations from the Greater London 
area during the period 1^^ January 1995 until 31^^ December 1997. The health data 
comprise daily counts of respiratory mortality drawn from the population living 
within Greater London, and are shown in Figure 1. A strong seasonal pattern is 
evident, with a large increase in the number of deaths during the winter of 1996/1997. 
The cause of this peak is unknown, and research has shown no influenza epidemic 
during this time (which has previously been associated with large increases of this 
type (Griffin and Neuzil (2002)). The air pollution data comprise particulate matter 
levels, which are measured as PMio at eleven locations across Greater London. To 
obtain a single measure of PMio, the values are averaged across the locations, a 
strategy which is commonly used in studies of this type(see for example Katsouyanni 
et al. (1996) and Samet et al. (2000)). For these data this strategy is likely to 
introduce minimal additional exposure error, because PMio levels in London between 
1994 and 1997 exhibit little spatial variation (Shaddick and Wakefield (2002)). In 
addition to the health and pollution data, a number of meteorological covariates 
including indices of temperature, rainfall, wind speed and sunshine, are measured at 
Heathrow airport. However, in this study only daily mean temperature, measured 
in Celsius C^C), is a significant covariate and the rest are not used. 

5.2 Description of the statistical models used 

Dynamic generalised linear models extend the standard approach to analysing these 
data by: (i) allowing the trend and temporal correlation in the health data to be 
removed with an autoregressive process; (ii) allowing the effects of air pollution to 
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evolve over time. To investigate these two extensions eight models are applied to 
the Greater London data, and a summary is given in Table 1. The general form of 
all eight models is given by 



yt ~ Poisson(;Uf) for t = 1, . . . , re, 
In(^j) = PMiO(_i7t +/?t + S'(temperaturej|3,Q;3), (5) 
a ~ N(/x„,S„), 

where Pt is the trend component, and 7t represents the effect of PMio on day t. 
The trend component is represented by one of four sub-models, denoted by (a) - (d) 
below, which take the form of a natural cubic spline of calendar time or one of the 
three autoregressive processes given in (4). 

(a) Natural cubic splines (b) First order random walk 



Pt = ai + S{t\k,a2) A~N(/3i„i,r2) 

I3q ~ N(3.5, 10) 



N(0,52)/[r2>0] 



(c) Second order random walk (d) Local linear trend 

/3_i,/3o~N(3.5,10) 6tr^N{6t-i,i'^) 
r2~N(0,<72)/[r2>o] /3o ~ N(3.5, 10) 

6o ~ N(0, 10) 

r2 ~N(0,ff4)7[r2>0] 

tp^ ~ N(0,55)/[^2>0] 

The effects of air pollution are represented by one of two components, denoted by 
(i) and (ii) below, which force these effects to be constant or allow them to evolve 
over time. 

(i) Constant (ii) Time-varying - first order random walk 

7t = 7 7t ~ N(7t_i,cr^) 

70 - N(0, 10) 

~N(0,5i)j[„2>o] 

In the model description above, N(0, (/i)7[o-2>o] denotes a truncated Gaussian dis- 
tribution where /[] is an indicator function which specifies the range of allowed 
(non-truncated) values. The smooth functions S{var\df,a) are estimated with nat- 
ural cubic splines, where var is the covariate and df is the degrees of freedom. The 
vector of fixed parameters is different for each model, and includes the intercept, 
the parameters that make up the natural cubic splines, and the constant effect of 
air pollution. To compare the results with those presented by Chiogna and Gaetan 
(2002), each model is analysed within the Bayesian approach described here and 
their likelihood based alternative. Likelihood based analysis is carried out using 
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the iteratively re-weighted Kalman filter and smoother proposed by Fahrmeir and 
Wagenpfeil (1997), while the hyperparameters are estimated using Akaike Informa- 
tion Criterion (AIC). The remainder of this subsection describes the model building 
process, including justifications for the choice of models. The first part focuses on 
the trend models, while the second discusses the air pollution component. 

5.2.1 Modelling trends, seasonal variation and temporal correlation 

The model building process began by removing the trend, seasonal variation and 
temporal correlation from the respiratory mortality series. These data exhibit a 
pronounced yearly cycle, which is partly modelled by the trend component (3t, and 
partly by daily mean temperature (also has a yearly cycle). The latter was added to 
the model at a number of different lags with different shaped relationships, and the 
fit to the data was assessed using the deviance information criterion (DIG, Spiegelal- 
ter et al. (2002)). As a result, a smooth function of the same days temperature with 
three degrees of freedom is used in the final models, because it has the lowest DIG, 
and has previously been shown to have a U-shaped relationship with mortality (see 
for example Dominici et al. (2000)). The smooth function is modelled with a natural 
cubic spline, because it is fully parametric making analysis within a Bayesian setting 
straightforward. 

The smooth function of calendar time (trend component (a)) is modelled by a natu- 
ral cubic spline for the same reason, and has previously been used by Daniels et al. 
(2004)). The smoothness of the spline is chosen by DIG to be 27, and is fixed prior 
to analysis. To allow a fairer comparison with the other trend components, the de- 
grees of freedom should be estimated simultaneously within the MGMG algorithm, 
but this makes the average trend impossible to estimate. As the smoothness of the 
spline is fixed, its parameters (part of ct) are given a non-informative Gaussian prior. 
In the Likelihood analysis, the smoothing parameter is chosen by minimising AIG 
which also leads to 27 degrees of freedom. 

The remaining three trend models are based on autoregressive processes, and their 
smoothness is controlled by the evolution variances {r'^,ip'^). Initially, these vari- 
ances were assigned non-informative inverse-gamma(0.01, 0.01) priors, but the esti- 
mated trends (not shown) just interpolates the data. This undesirable aspect can 
be removed by assigning (t^,'0^) informative priors, which shrink their estimates 
towards zero producing a smoother trend. The choice of an informative prior within 
the inverse- gamma family is not straightforward, and instead we represent our prior 
beliefs as a Gaussian distribution with mean zero, which is truncated to be positive. 
This choice of prior forces (r^,')/'^) to be close to zero, with the prior variances, 
denoted by {92, 93, 94, 95), controlling the level of informativeness. Smaller prior 
variances results in more prior weight close to zero, forcing the estimated process to 
be smoother. 

It seems likely that the trend in mortality will be similar on consecutive days, mean- 
ing that the autoregressive process should evolve smoothly over time. The trend 
is modelled on the linear predictor scale, which corresponds to the natural log of 
the data and has a range between 2.5 and 4.5 daily deaths (between about 12 and 
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90 on the un- logged scale). On that scale, a jump of 0.01 on consecutive days is 
approximately the largest difference that cannot be detected by the eye, resulting 
in a visually smooth trend. To relate this to the choice of {§2, 93, 94., 95), each of the 
three processes were simulated with a variety of variances, and the average absolute 
difference between consecutive values was calculated. The variances were chosen so 
that 50% of the prior mass was below the threshold value that gave average differ- 
ences of 0.01, resulting in §2 = 10"'', = 10~^^. The local linear trend model has 
two variance parameters, and it was found that both needed to be tightly controlled 
for the process to evolve smoothly, resulting in 94 = 95 = 10~^^. Sensitivity analyses 
were carried out for different values of {92, 93, 94, 95), but it was found that larger 
values resulted in trends that were not visually smooth. In the likelihood analysis, 
the variance parameters are chosen by optimising AIC. The priors for the initialis- 
ing parameters /3o, <5o) are non- informative Gaussian distributions with mean 
equal to zero for the rate 5o, and 3.5 for (/3_i,/3o), the average of mortality data 
from previous years on the logged scale. 

5.2.2 Modelling the effects of PMio 

After modelling the influence of unmeasured risk factors, the effects of PMio at a 
number of different lags were investigated. A lag of one day is used in the final 
models, because it has the minimum DIG and has been used in other recent studies 
(see for example Dominici et al. (2000), and Zhu et al. (2003)). Gonstant and time- 
varying effects of PMio a-re investigated in this case study, with the latter modelled 
by a first order random walk, which allows a comparison with the work of Ghiogna 
and Gaetan (2002). Initially, a non-informative inverse- gamma(0.01, 0.01) prior was 
specified for the variance of the random walk (denoted by o"^), but the estimated 
time- varying effects (not shown) are contaminated by noise and an underlying trend 
cannot be seen. These effects are likely to evolve smoothly over time, and to en- 
force this smoothness o"^ is assigned an informative zero mean Gaussian prior which 
is truncated to be positive. The informativeness is controlled by the variance gi, 
which is chosen using an identical approach to that described above. In this case 
the likely range of effects is -0.003 to 0.005, and the largest difference that is unde- 
tectable by the eye is around 0.00005, leading to gi = 10~^^. To corroborate this 
choice a sensitivity analysis was carried out for different values of gi , which showed 
the evolution was smooth for values as large as 10"^''. As this is less informative 
than 10~^^, it is used in the final models. In the likelihood analysis, the variance 
parameter is chosen by optimising the AIG. 

5.3 Results 

The models contain a large number of parameters, so to aid convergence the covari- 
ates (PMio smd the basis functions for the natural cubic splines of calendar time 
and temperature) are standardised to have a mean of zero and a standard deviation 
of one before inclusion in the model (and are subsequently back-transformed when 
obtaining results from the posterior distribution). The Markov chains are burnt in 
for 40,000 iterations, by which point convergence was assessed to have been reached 
using the methods of Gelman et al. (2003). At this point a further 100,000 iterations 
are simulated, which are thinned by 5 to reduce autocorrelation, resulting in 20,000 
samples from the joint posterior distribution. 
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5.3.1 Results for the four trend models (3t 



Long-term trends, overdispersion and temporal correlation are removed from the 
health data with one of four trend models: a natural cubic spline of calendar time 
(models 1 and 2); a first order random walk (models 3 and 4); a second order random 
walk (models 5 and 6); and a local linear trend model (models 7 and 8). To aid 
clarity in the following discussion, these approaches are compared and contrasted 
assuming a constant effect of air pollution (using the odd numbered models). Figure 
1 shows the health data from Greater London, together with the estimated trends 
from the Bayesian (solid lines) and likelihood (dotted lines) analyses. Panel (a) 
shows the estimated trend from a natural cubic spline of calendar time, panel (b) 
relates to the first order random walk, panel (c) to the second order random walk, 
and panel (d) to the local linear trend model. All four models capture the under- 
lying trend in the health data well, and the Bayesian and likelihood estimates are 
very similar. The only major differences between the eight estimates are in the win- 
ters of 1996 and 1997, where the respiratory mortality data has yearly peaks. For 
each trend model, the Bayesian estimate captures the height of these peaks better 
than its likelihood counterpart, while the second order random walk outperforms the 
other three alternatives. For example, in the winter of 1997 the maximum number 
of deaths on a single day is 145, and the Bayesian estimates of this peak are, (a) - 
98.4, (b) - 92.2, (c) - 107.6, (d) - 101.5, while the corresponding likelihood values 
are, (a) - 69.4, (b) - 79.1, (c) - 85.9, (d) - 85.1. These figures show that the second 
order random walk is the most adept at modelling these peaks, while the local linear 
trend model outperforms both the natural cubic spline and first order random walk. 

All eight estimates have the same visual smoothness, and a summary of the smooth- 
ing parameters is given in Table 2. For the natural cubic spline model, which does 
not estimate the number of basis functions as part of the MCMC algorithm (it is 
estimated by DIG), the estimate of k is identical in both analyses (it is estimated 
by GGV in a likelihood analysis). However, for the remaining three analyses, the 
likelihood estimates of the smoothing parameters (r^) are significantly larger than 
their Bayesian counterparts, without the corresponding trends being less smooth. 
This is unexpected, and is most likely caused by differences in the techniques used 
to estimate the autoregressive processes, a point which is taken up in the discussion. 

To examine how effective each trend model is at removing temporal correlation 
from the health data, a measure of the residuals is required. In a Bayesian setting 
residuals are not well defined (see Pettit (1986)), because there is no natural point 
estimate for the parameters. Instead, a 'residual distribution' can be generated for 
each ut by simulation. For example, a Pearson type distribution has a jth 'realised 
residual' 



in which 0^^' is the jt/i sample from the joint posterior distribution. The residual 
distribution takes into account the uncertainty in 9, and residuals based on point 
estimates are approximations to this distribution. Figure 2 shows the autocorrelation 
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function of an approximation to this residual distribution, which is based on posterior 
medians. The second order random walk again outperforms the other approaches, 
showing little or no correlation in the approximate Pearson residuals. In contrast, 
the natural cubic spline is the worst of the four trend components, having significant 
correlation at the first four lags. The remaining two models perform similarly, and 
only show significant correlation at the first lag. The residuals from the likelihood 
analyses (not shown) show a similar comparison between the four approaches, but 
exhibit greater correlation than those from the Bayesian analysis, suggesting that 
the Bayesian models are superior. A plot of the residuals against time showed little 
difference between the four approaches and is not shown. 

5.3.2 Results for the time-varying effects of air pollution 7t 

The models presented here allow the effects of air pollution to evolve over time as 
a first order random walk, or be fixed at a constant value. In the graphs and tables 
that follow, these effects are given as a relative risk for an increase in 10 units of 
PMiQ. This is calculated as the ratio of expected number of deaths, fif^^ / fit, where 
fif^^ is the expected number of deaths if the air pollution level had risen by 10 
units. The relative risk is given by exp(107) (for models 1,3,5 and 7) and a value of 
1 represents no effect of air pollution. 

(a) - Constant effects 

Table 3 shows the estimated relative risks from models 1, 3, 5 and 7, which force 
the effects of air pollution to be constant. All eight Bayesian and likelihood esti- 
mates are very similar (range from 1.007 to 1.015), suggesting that the method of 
analysis and the choice of trend component do not affect the estimated health risk. 
The estimates from the likelihood analyses are always larger than those from the 
corresponding Bayesian model, although the differences are not large. The Bayesian 
credible intervals are wider than their likelihood counterparts, and few of the in- 
tervals contain one, suggesting that exposure to PMio has a statistically significant 
effect on mortality. 

(b) - Time-varying effects 

In the likelihood analyses the estimated variance parameters are all zero, forcing 
the time- varying effect to be constant. However, in the Bayesian analyses these 
estimates are greater than zero, and the time- varying effects are shown in Figure 3. 
The evolution in the effects is smooth, which is a result of the informative prior placed 
on the variance o"^. All four estimates are very similar, suggesting that the choice of 
model for the unmeasured risk factors does not affect the substantive conclusions. 
The effects exhibit a slowly increasing long-term trend, which has ranges of: (a) 
1.005 to 1.015; (b) 1.007 to 1.014; (c) 1.002 to 1.019; (d) 0.999 to 1.024. The 95% 
credible intervals for panels (a) and (b) (models 2 and 4 respectively) are of a similar 
size, but the remaining two exhibit substantial additional variation, especially in 
panel (d). This additional variation is not supported by the same pattern in the 
credible intervals for the constant effects, a point which is taken up in the discussion. 
However, the width of the four intervals in panels (a) to (d) suggest that a constant 
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effect of PMio cannot be ruled out. 

6 Discussion 

This paper proposes the use of Bayesian dynamic generahsed hnear models to es- 
timate the relationship between air pollution exposure and mortality or morbidity. 
The majority of air pollution and health studies fix the effects of air pollution to 
be constant over time, and model long-term trends and temporal correlation in the 
health data using a smooth function of calendar time. The DGLM framework al- 
lows autoregressive processes to be used for both these factors, the first of which 
allows the effects of air pollution change over time. A Bayesian approach is assumed 
throughout with analysis based on MCMC simulation. In addition a likelihood anal- 
ysis is also presented, which allows a comparison with the only previous air pollution 
and health study that used a DGLM. 

The results from the four trend models lead to two main conclusions. Firstly, al- 
though all four trend components capture the underlying level of daily deaths rel- 
atively well, the standard approach of using smooth functions is outperformed by 
the autoregressive processes. In particular, the best of these processes is the second 
order random walk, because its residuals exhibit no correlation, and the two winter 
peaks in daily mortality are well represented. In contrast, the smooth function leaves 
significant correlation in the residuals, while the estimated peaks are captured less 
well. The local linear trend model also performs better than the smooth function, 
but the first order random walk gives similar results. The poor performance of the 
smooth function is most likely caused by the way it is estimated, which includes the 
choice of smoothing parameter and the use of natural cubic splines. The smooth 
function's degrees of freedom is estimated by DIG and is fixed during the simulation, 
which is in contrast to the autoregressive processes whose smoothing parameters are 
estimated within the MGMG algorithm. As a result, the estimated autoregressive 
trends incorporate the variation in their smoothing parameter, which is not the 
case for the smooth function and may account for the latter's poorer performance. 
Another possible cause of the smooth function's poorer performance is the use of 
natural cubic splines to estimate it. Regression splines were used here because of 
their parametric make-up, but are known to be less flexible than non-parametric 
alternatives. An interesting area of future research would be to compare the per- 
formances of the trend models used here, against non-parametric smooth functions, 
such as smoothing splines or LOESS smoothers. 

Secondly, the Bayesian approach gives results that are superior to the likelihood 
analysis, both in terms of removing temporal correlation from the health data, and 
its ability to capture winter peaks in mortality. The estimated smoothing parame- 
ters for the Bayesian and likelihood implementations of the natural cubic splines are 
obtained by optimising data driven criteria (DIG and AIG), and it is not surpris- 
ing that both estimates are identical. However, for the autoregressive processes the 
Bayesian estimates are smaller than their likelihood counterparts, which is caused 
by the relative strengths of the truncated Gaussian prior and the penalty term in 
the AIG criteria. A sensitivity analysis shows that such a strong prior is required, 
because using a non-informative prior for results in the estimated trend interpolat- 
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ing the data. An initial comparison of the estimated smoothing parameters (r^) for 
the Bayesian and likehhood analyses, shows that the latter were larger and therefore 
might be expected to produce a trend exhibiting greater variability (and thus model 
the data at the peaks more accurately). However, the opposite was observed, and 
the larger estimates of in the likelihood analyses result in trends which are less 
variable. This apparent anomaly is most likely caused by differences in the methods 
used to implement the autoregressive constraint for (3. In the Bayesian analysis, this 
is implemented through the specification of an autoregressive prior /(/3), whereas 
the likelihood approach enforces the autoregressive constraint using the Kalman fil- 
ter. The filter uses a two stage process which firstly estimates E[/3Jyi, . . . , yj], for all 
t, and then smoothes the results by estimating E[/3j|?/i, . . . , The final likelihood 
estimates are based on these smoothed values, and it is this additional smoothing 
imposed by the Kalman filter, that reduces the variability in the estimated trends, 
which over smoothes the data in this case. 

The Bayesian estimates of the pollution-mortality relationship exhibit a consistent 
long-term pattern regardless of the choice of trend model, suggesting that this tem- 
poral variation should be investigated further. However, no seasonal interaction is 
observed, meaning that the model of Peng et al. (2005) is too restrictive for these 
data. The informative prior for cr^ forces these effects to evolve smoothly over time, 
while a sensitivity analysis showed that using a non-informative prior leads to the 
estimate being contaminated with noise. This noise is caused by the excess number 
of parameters used to model the time- varying effects, which makes these parameters 
non-identifiable. A non-informative prior for cr^ is too weak for these data, and the 
specification of an informative prior shrinks the evolution variance towards zero, 
effectively reducing the number of parameters. The resulting temporal evolution is 
smooth, but this is achieved at the expense of a very informative prior. 

The estimated time-varying effects are not altered by the choice of trend model, 
although the credible intervals increase in width if a second order random walk or 
local linear trend are used. These two represent the most flexible trend models, 
and their increased variation may cause slight non-identifiability or collinearity with 
the time- varying effects of PMio, reducing their precision. The estimated temporal 
variation from the Bayesian models exhibit a similar shape to those reported by 
Chiogna and Gaetan (2002) in Birmingham Alabama, using a likelihood approach. 
However, this contrasts with our likelihood based analyses which forced the effects 
to be constant and not exhibit any temporal variation. The difference in curvature 
between our Bayesian and likelihood analyses is again due to the way the smoothing 
parameters are estimated. The likelihood approach calculates the likelihood for a 
range of values of the smoothing parameter, and estimates o"^ by optimising a data 
driven criterion. In contrast, the Bayesian approach averages over the posterior for 
(T^, which incorporates the possibility of no smoothing, thus leading to an estimate 
which exhibits greater curvature. 
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Appendix A - simulation of (3 



The first p parameters are updated separately from /3i, . . . because their full 
conditional distribution does not depend on y and is a standard Gaussian distribu- 
tion. In contrast, . . . ,/3„ are sampled using a block Metropolis-Hastings scheme, 
in which the proposal distribution is based on the autoregressive prior. Ignoring 
/3_p+i, • • • , /3o which have already been sampled, the autoregressive prior can be 
written as a singular multivariate Gaussian distribution 



/(/31F,S^) = nN(/3,|Fi/3,_i + ... + Fp/3,„p,S^) 

1 



oc exp \^-^0^Kp 

with mean zero and singular precision matrix K. The precision matrix is given by 

-^-p+l,-p+l • • • -^-p+l,n 



K 



K.. 



n,— p+l 



{n+p)qx{n+p)q 



where Kt^t is a q x q block relating to /3^. The blocks depend on the order of 
the AR(p) process, and K has a bandwidth of p blocks (all blocks Kij, for which 
N ~ j\ ^ P zero). For example, an AR(1) process leads to 



F^E-iFi t = 

FTs-1Fi + S-i t = l,...,n-l 



Kt,t+i 
Kt,t-i 



Vi, 
V t. 



t = n 



The parameters are updated in blocks of size g, which is used as a tuning parameter 
to achieve the desired acceptance rates. The proposal distribution for a block ,, = 
(/3r, . . . , /3Jggxi, in which s = r + g-lis given by 

where l3_rs denotes all elements of (3 except P^.^. The mean and variance are given 
by 



Mr 



~^r,s^-p+l,r-lP~p+l,r~l 
-K^ g Ks+l^nPs+l^n 



if s=n 
if r=-p+l 
otherwise 



^'■r.s 1 



which was calculated using standard properties of the multivariate Gaussian distri- 
bution. In this calculation the precision matrix is decomposed into 
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^-p+l,r-l 
K = I Kr^s Ks+l,n 

where Kr^s is the square gq x gq matrix containing blocks Kr^r to Ks^s- The remain- 
ing two blocks are rectangular, contain the same rows as Kr^g, and include all the 
remaining columns. To avoid any mixing problems at the boundaries of each block, 
the length of the first block can be randomly generated from the set {q, 2q, . . . , gq}. 
The acceptance probability of a move from /3:^~^ to /3* ^ is given by 

min / 1 n?=r Poisson(?/i|/3i*, a 



Further details can be found in Knorr-Held (1999). 
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Table 1: Summary of the eight models. The base model is given by (5). 



Model 


Trend Pt 


Air pollution effect 7^ 


1 


(a) - splines 


(i) - constant 


2 


(a) - splines 


(ii) - random walk 


3 


(b) - first order random walk 


(i) - constant 


4 


(b) - first order random walk 


(ii) - random walk 


5 


(c) - second order random walk 


(i) - constant 


6 


(c) - second order random walk 


(ii) - random walk 


7 


(d) - local linear trend 


(i) - constant 


8 


(d) - local linear trend 


(ii) - random walk 



Table 2: Summary of the smoothing parameters from the Bayesian and likelihood 
analyses. 



Model 


Parameter 




Bayesian 




Likelihood 






2.5% 


median 


97.5% 




1 


k 


NA 


27 


NA 


27 


2 


k 


NA 


27 


NA 


27 






9.91x10"^ 


1.10x10-'^ 


1.21x10-^^ 





3 




0.00018 


0.00019 


0.00021 


0.373 


4 




0.00018 


0.00019 


0.00021 


0.373 






1.01x10"'^ 


l.lOxlO"'^ 


1.20x10-^^ 





5 




2.20x10"*^ 


3.56x10^*^ 


5.78x10-6 


0.004 


6 




2.27x10"*^ 


3.67x10"*^ 


5.95x10-6 


0.004 






1.19x10"^ 


8.25x10"'^ 


1.66x10-6 





7 




1.61x10"^ 


1.68x10"^ 


1.76x10-*" 


10-'^ 






3.00x10"^ 


3.09x10-6 


3.18x10-6 


0.003 


8 




1.58x10"^ 


1.67x10-'' 


1.77x10-'^ 


10-'^ 






3.05x10-6 


3.17x10-6 


3.25x10-6 


0.003 






7.43 xlO"'^ 


2.64x10-6 


5.44x10-6 






Table 3: Relative risks for an increase in IQ^g/m? of PMio and corresponding 95% 
credible or confidence intervals. 



Model 


Bayesian 


Likelihood 


1 


1.007 (0.998 , 1.016) 


1.015 (1.010 , 1.020) 


3 


1.011 (1.002 , 1.020) 


1.014 (1.008 , 1.019) 


5 


1.008 (0.999 , 1.017) 


1.014 (1.008 , 1.019) 


7 


1.009 (1.000 , 1.019) 


1.013 (1.007 , 1.018) 
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Figure 1: The health data and the estimated trends from the four approaches: (a) 
natural cubic spline; (b) first order random walk; (c) second order random walk; (d) 
local linear trend. Bayesian and likelihood estimates are represented by solid and 
dashed lines respectively. 
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(c) - Model 5 












(d) - Model 7 






deaths 


100 140 










deaths 


100 140 
1 1 1 




f 






o 


o 










ler of 


o 










E 


!D 










E 












z 


O _ 










z 


o _ 












01.01,1995 


1 1 

01/01/1996 01/01/1997 


31/12/1997 




01/01/1995 


1 1 

01/01/1996 01/01/1997 


1 

31/12/1997 








Time in days 












Time in days 







24 



Figure 2: Autocorrelation function of the Bayesian residuals for different type of 
trends: (a) natural cubic spline; (b) first order random walk; (c) second order random 
walk; (d) local linear trend. 
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Figure 3: Estimated time-varying effects (solid line) of PMio from the Bayesian 
analyses with 95% credible intervals. The dotted line represents a constant effect 
over time. The four panels relate to the different trend models: (a) natural cubic 
spline; (b) first order random walk; (c) second order random walk; (d) local linear 
trend. 
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(c) - Model 6 
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